An Elegant Algorithm for the Construction of Suffix Arrays.
Identifieur interne : 001D49 ( Main/Exploration ); précédent : 001D48; suivant : 001D50An Elegant Algorithm for the Construction of Suffix Arrays.
Auteurs : Sanguthevar Rajasekaran [États-Unis] ; Marius Nicolae [États-Unis]Source :
- Journal of discrete algorithms (Amsterdam, Netherlands) [ 1570-8667 ] ; 2014.
Abstract
The suffix array is a data structure that finds numerous applications in string processing problems for both linguistic texts and biological data. It has been introduced as a memory efficient alternative for suffix trees. The suffix array consists of the sorted suffixes of a string. There are several linear time suffix array construction algorithms (SACAs) known in the literature. However, one of the fastest algorithms in practice has a worst case run time of O(n2). The problem of designing practically and theoretically efficient techniques remains open. In this paper we present an elegant algorithm for suffix array construction which takes linear time with high probability; the probability is on the space of all possible inputs. Our algorithm is one of the simplest of the known SACAs and it opens up a new dimension of suffix array construction that has not been explored until now. Our algorithm is easily parallelizable. We offer parallel implementations on various parallel models of computing. We prove a lemma on the ℓ-mers of a random string which might find independent applications. We also present another algorithm that utilizes the above algorithm. This algorithm is called RadixSA and has a worst case run time of O(n log n). RadixSA introduces an idea that may find independent applications as a speedup technique for other SACAs. An empirical comparison of RadixSA with other algorithms on various datasets reveals that our algorithm is one of the fastest algorithms to date. The C++ source code is freely available at http://www.engr.uconn.edu/~man09004/radixSA.zip.
DOI: 10.1016/j.jda.2014.03.001
PubMed: 25045344
Affiliations:
Links toward previous steps (curation, corpus...)
- to stream PubMed, to step Corpus: 001905
- to stream PubMed, to step Curation: 001905
- to stream PubMed, to step Checkpoint: 001A01
- to stream Ncbi, to step Merge: 000E43
- to stream Ncbi, to step Curation: 000E43
- to stream Ncbi, to step Checkpoint: 000E43
- to stream Main, to step Merge: 001D64
- to stream Main, to step Curation: 001D49
Le document en format XML
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">An Elegant Algorithm for the Construction of Suffix Arrays.</title>
<author><name sortKey="Rajasekaran, Sanguthevar" sort="Rajasekaran, Sanguthevar" uniqKey="Rajasekaran S" first="Sanguthevar" last="Rajasekaran">Sanguthevar Rajasekaran</name>
<affiliation wicri:level="2"><nlm:affiliation>Dept. of Computer Science and Engineering, Univ. of Connecticut, Storrs, CT, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Dept. of Computer Science and Engineering, Univ. of Connecticut, Storrs, CT</wicri:regionArea>
<placeName><region type="state">Connecticut</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Nicolae, Marius" sort="Nicolae, Marius" uniqKey="Nicolae M" first="Marius" last="Nicolae">Marius Nicolae</name>
<affiliation wicri:level="2"><nlm:affiliation>Dept. of Computer Science and Engineering, Univ. of Connecticut, Storrs, CT, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Dept. of Computer Science and Engineering, Univ. of Connecticut, Storrs, CT</wicri:regionArea>
<placeName><region type="state">Connecticut</region>
</placeName>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PubMed</idno>
<date when="2014">2014</date>
<idno type="RBID">pubmed:25045344</idno>
<idno type="pmid">25045344</idno>
<idno type="doi">10.1016/j.jda.2014.03.001</idno>
<idno type="wicri:Area/PubMed/Corpus">001905</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">001905</idno>
<idno type="wicri:Area/PubMed/Curation">001905</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">001905</idno>
<idno type="wicri:Area/PubMed/Checkpoint">001A01</idno>
<idno type="wicri:explorRef" wicri:stream="Checkpoint" wicri:step="PubMed">001A01</idno>
<idno type="wicri:Area/Ncbi/Merge">000E43</idno>
<idno type="wicri:Area/Ncbi/Curation">000E43</idno>
<idno type="wicri:Area/Ncbi/Checkpoint">000E43</idno>
<idno type="wicri:doubleKey">1570-8667:2014:Rajasekaran S:an:elegant:algorithm</idno>
<idno type="wicri:Area/Main/Merge">001D64</idno>
<idno type="wicri:Area/Main/Curation">001D49</idno>
<idno type="wicri:Area/Main/Exploration">001D49</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en">An Elegant Algorithm for the Construction of Suffix Arrays.</title>
<author><name sortKey="Rajasekaran, Sanguthevar" sort="Rajasekaran, Sanguthevar" uniqKey="Rajasekaran S" first="Sanguthevar" last="Rajasekaran">Sanguthevar Rajasekaran</name>
<affiliation wicri:level="2"><nlm:affiliation>Dept. of Computer Science and Engineering, Univ. of Connecticut, Storrs, CT, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Dept. of Computer Science and Engineering, Univ. of Connecticut, Storrs, CT</wicri:regionArea>
<placeName><region type="state">Connecticut</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Nicolae, Marius" sort="Nicolae, Marius" uniqKey="Nicolae M" first="Marius" last="Nicolae">Marius Nicolae</name>
<affiliation wicri:level="2"><nlm:affiliation>Dept. of Computer Science and Engineering, Univ. of Connecticut, Storrs, CT, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Dept. of Computer Science and Engineering, Univ. of Connecticut, Storrs, CT</wicri:regionArea>
<placeName><region type="state">Connecticut</region>
</placeName>
</affiliation>
</author>
</analytic>
<series><title level="j">Journal of discrete algorithms (Amsterdam, Netherlands)</title>
<idno type="ISSN">1570-8667</idno>
<imprint><date when="2014" type="published">2014</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass></textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">The suffix array is a data structure that finds numerous applications in string processing problems for both linguistic texts and biological data. It has been introduced as a memory efficient alternative for suffix trees. The suffix array consists of the sorted suffixes of a string. There are several linear time suffix array construction algorithms (SACAs) known in the literature. However, one of the fastest algorithms in practice has a worst case run time of <i>O</i>
(<i>n</i>
<sup>2</sup>
). The problem of designing practically and theoretically efficient techniques remains open. In this paper we present an elegant algorithm for suffix array construction which takes linear time with high probability; the probability is on the space of all possible inputs. Our algorithm is one of the simplest of the known SACAs and it opens up a new dimension of suffix array construction that has not been explored until now. Our algorithm is easily parallelizable. We offer parallel implementations on various parallel models of computing. We prove a lemma on the ℓ-mers of a random string which might find independent applications. We also present another algorithm that utilizes the above algorithm. This algorithm is called RadixSA and has a worst case run time of <i>O</i>
(<i>n</i>
log <i>n</i>
). RadixSA introduces an idea that may find independent applications as a speedup technique for other SACAs. An empirical comparison of RadixSA with other algorithms on various datasets reveals that our algorithm is one of the fastest algorithms to date. The C++ source code is freely available at http://www.engr.uconn.edu/~man09004/radixSA.zip.</div>
</front>
</TEI>
<affiliations><list><country><li>États-Unis</li>
</country>
<region><li>Connecticut</li>
</region>
</list>
<tree><country name="États-Unis"><region name="Connecticut"><name sortKey="Rajasekaran, Sanguthevar" sort="Rajasekaran, Sanguthevar" uniqKey="Rajasekaran S" first="Sanguthevar" last="Rajasekaran">Sanguthevar Rajasekaran</name>
</region>
<name sortKey="Nicolae, Marius" sort="Nicolae, Marius" uniqKey="Nicolae M" first="Marius" last="Nicolae">Marius Nicolae</name>
</country>
</tree>
</affiliations>
</record>
Pour manipuler ce document sous Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001D49 | SxmlIndent | more
Ou
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001D49 | SxmlIndent | more
Pour mettre un lien sur cette page dans le réseau Wicri
{{Explor lien |wiki= Sante |area= MersV1 |flux= Main |étape= Exploration |type= RBID |clé= pubmed:25045344 |texte= An Elegant Algorithm for the Construction of Suffix Arrays. }}
Pour générer des pages wiki
HfdIndexSelect -h $EXPLOR_AREA/Data/Main/Exploration/RBID.i -Sk "pubmed:25045344" \ | HfdSelect -Kh $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd \ | NlmPubMed2Wicri -a MersV1
This area was generated with Dilib version V0.6.33. |